Refactor zstash #370
Conversation
Advantages:
Obstacles:
Other suggestions, from comments on #363:
Comments from @TonyB9000:
Currently going through commands systematically, moving state data into an object. This removes the need for global variables and makes it easier to get a "snapshot" of the parameters at any given point.
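As a rough illustration of that pattern (the class and field names here are hypothetical, not zstash's actual ones), per-command state can live in one dataclass instead of module-level globals:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical sketch: one object holds what used to be globals,
# so a snapshot of the current parameters is trivial to take.
@dataclass
class CommandState:
    cache_dir: str = "zstash"
    hpss_path: Optional[str] = None
    keep: bool = False

    def snapshot(self) -> dict:
        """Return the current parameter values at this point in the run."""
        return {
            "cache_dir": self.cache_dir,
            "hpss_path": self.hpss_path,
            "keep": self.keep,
        }
```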
@forsyth2 Excellent. This will certainly help keep things straight.
@TonyB9000 They started as prototyping attempts, meaning I already had some code. So, they went under pull requests instead of issues.
The first commit (bd08c60) is my refactor to use objects. The second commit (b04221a) is my debugging of the double-authentication issue. Using these commits, if I run the following on Perlmutter:

```shell
mkdir zstash_globus_setup
cd zstash_globus_setup
mkdir zstash_demo; echo 'file0 stuff' > zstash_demo/file0.txt
rm ~/.globus-native-apps.cfg
# Check I'm logged into "NERSC Perlmutter" and "Globus Tutorial Collection 1" on globus.org
zstash create --verbose --hpss=globus://6c54cade-bde5-45c1-bdea-f4bd71dba2cc/~/manual_run zstash_demo
```

then I get:
To remove consents, to start fresh:
zstash/globus.py
Outdated
```diff
  scopes = "urn:globus:auth:scope:transfer.api.globus.org:all["
- for ep_id in [remote_endpoint, local_endpoint]:
-     if check_endpoint_version_5(ep_id):
+ for ep_id in [globus_info.remote_endpoint, globus_info.local_endpoint]:
+     if check_endpoint_version_5(globus_info, ep_id):
          scopes += f" *https://auth.globus.org/scopes/{ep_id}/data_access"
  scopes += " ]"
```
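The scope string built here can be factored into a small standalone helper (a hypothetical function, not one zstash defines) so the bracketed dependent-scope syntax is easy to check in isolation:

```python
def build_transfer_scopes(endpoint_ids):
    """Build a Globus transfer scope string with a dependent
    data_access scope for each endpoint, using the bracketed syntax."""
    scopes = "urn:globus:auth:scope:transfer.api.globus.org:all["
    for ep_id in endpoint_ids:
        scopes += f" *https://auth.globus.org/scopes/{ep_id}/data_access"
    scopes += " ]"
    return scopes
```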
We can't do this scope expansion until after transfer_client gets set. It appears the two authentications are 1) initiating the native client and 2) adding the scopes of the local and remote endpoints to it.
@forsyth2 Although I'd never looked very hard, I had assumed the first involved Globus recognizing the account as a valid Globus user, and then conducting ep-authentication/scoping. Surprised there are not 3 of these (globus, ep-1, ep-2).
I have gotten different results (even in the globus web UI) when authenticating to endpoints in a different order.
```diff
  native_client = NativeClient(
-     client_id="6c1629cf-446c-49e7-af95-323c6412397f",
+     client_id=ZSTASH_CLIENT_ID,
      app_name="Zstash",
      default_scopes="openid urn:globus:auth:scope:transfer.api.globus.org:all",
  )
+ log_current_endpoints(globus_info)
+ logger.debug(
+     "globus_activate. Calling login, which may print 'Please Paste your Auth Code Below:'"
+ )
  native_client.login(no_local_server=True, refresh_tokens=True)
  transfer_authorizer = native_client.get_authorizers().get("transfer.api.globus.org")
  transfer_client = TransferClient(authorizer=transfer_authorizer)
```
#349 has an initial implementation of how this code block might be replaced.
|
Reviewing the list of steps in #339:
Per #339, in the past, steps 2-4 weren't required at all. |
@TonyB9000 It looks like I have a working refactor; the first commit, bd08c60, passes the unit tests. In the second commit, b04221a, I try to debug the authentication issue. My current plan is to merge the refactor and the globus fixes as separate PRs (in any case, certainly as separate commits), but the refactor is needed as a new baseline for the globus fixes. We should meet to discuss the globus fixes further, but in the meantime you can take a look at that second commit and the comments I've made here.
@forsyth2 Regarding "Reviewing the list of steps in #339:": if this was a "once-per-month" thing, it would be tolerable for automation. I can understand different sites not "trusting one another" in terms of authentication (like, "I see you qualified as USER-at-site-1, so I will accept you as USER-at-site-2"), so as not to propagate a compromised account. (I never tried this, but in Globus Web, I suppose you could cycle through 5 different collections hosted at 5 different sites, satisfy the authentication at each one, and thereafter move from one collection to another at will. But they need to last longer.) The obstacle to automation, as I see it, is that you need to authenticate to at least 3 parties (globus, party-1, party-2), and you get knocked out by whoever has the shortest expiration.
I agree with auto-deleting the globus config file. I would prefer the user never need to know such a file exists.
As far as zstash is concerned, I thought the first exercised transfer was to fetch (or remote-create) the index.db file, which can occur very fast (isn't that what "zstash -ls" performs?)
We can meet at your convenience. I am rebuilding my conda environment for my dsm-testing. Major rewrite of the slurm/srun job-launching system to avoid hangs. (I did confirm that there are random hangs of jobs in a sequence, and would love to know why.)
zstash/globus.py
Outdated
```diff
- "The {} endpoint is not activated or the current activation expires soon. Please go to https://app.globus.org/file-manager/collections/{} and (re)activate the endpoint.".format(
-     ep_id, ep_id
- )
+ f"The {ep_id} endpoint is not activated or the current activation expires soon. Please go to https://app.globus.org/file-manager/collections/{ep_id} and (re)activate the endpoint."
```
Maybe we can infer what the endpoints are going to be later on in the code, based on the current machine and the hpss path (using Mache maybe?).
Then, we could check endpoint activation earlier, but even then, looking at this code block, we'd need transfer_client set as well.
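One way to sketch that inference (purely illustrative; the Perlmutter endpoint UUID is taken from the NAME_TO_ENDPOINT_MAP in the example script later in this thread, and a real implementation might query mache instead):

```python
import re
from typing import Optional

# Hypothetical hostname-pattern -> endpoint-UUID map; the Perlmutter UUID
# comes from the example script elsewhere in this thread.
HOSTNAME_TO_ENDPOINT = {
    r"perlmutter": "6bdc7956-fc0f-4ad2-989c-7aa5ee643a79",  # NERSC Perlmutter
}

def deduce_local_endpoint(hostname: str) -> Optional[str]:
    """Guess the local Globus endpoint from the current machine's hostname."""
    for pattern, ep_id in HOSTNAME_TO_ENDPOINT.items():
        if re.search(pattern, hostname):
            return ep_id
    return None  # Unknown machine: fall back to asking the user
```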
we'd need transfer_client set as well.
It turns out we don't! check_endpoint_version_5 needs it to be set, but we actually don't need to call that function if the consents are working fine in the first place.
zstash/globus.py
Outdated
```diff
      "submit_transfer_with_checks. Calling login, which may print 'Please Paste your Auth Code Below:'"
  )
  native_client.login(requested_scopes=scopes)
+ # Quit here and tell user to re-try
```
This "Consents added, please re-run the previous command to start transfer" is a real obstacle to users. It basically means they have to run a toy version of zstash first, as specified in #329 and #339. We need to at least try to immediately alert the user that consents aren't set up rather than waiting until we try to transfer data.
Agreed. There should be a globus function specific to this issue:

```python
def has_consent(endpoint, scopes):
    ...
    return [True, False]
```
@forsyth2 Without doing "damage", is there a way we could code a module (or function) that emulates "has_consent()" above, going as far as obtaining a transfer client (for a "toy" transfer, like "ls") and hides the details from the user?
(I don't really understand what is gained by registering for a "client_id". Is this intended to shorten the authentication steps? If I write an application that has no registered client_id, can I still use the globus_sdk to code up functioning transfers?)
Yeah, I'm trying to do some prototyping of what that would look like.
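A minimal sketch of that probe, assuming we already have a transfer client: try a cheap listing operation and treat a consent-required error as "no consent yet". (`has_consent` and its generic error handling are illustrative; in practice the exception would be globus_sdk's TransferAPIError, whose `info.consent_required` flag this thread already relies on.)

```python
def has_consent(transfer_client, endpoint_id) -> bool:
    """Probe an endpoint with a cheap 'ls' to see whether consents are
    already in place, instead of failing mid-transfer later."""
    try:
        transfer_client.operation_ls(endpoint_id, path="/~/")
        return True
    except Exception as err:  # globus_sdk.TransferAPIError in practice
        info = getattr(err, "info", None)
        if info is not None and getattr(info, "consent_required", False):
            return False  # Alert the user up front that consents are missing
        raise  # Some other error: surface it unchanged
```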
forsyth2
left a comment
@TonyB9000 I added another commit, 96677fa, that mostly pulls out some code into helper functions.
```python
def set_clients(globus_info: GlobusInfo):
    native_client = NativeClient(
        client_id=ZSTASH_CLIENT_ID,
        app_name="Zstash",
        default_scopes="openid urn:globus:auth:scope:transfer.api.globus.org:all",
    )
    log_current_endpoints(globus_info)
    logger.debug(
        "set_clients. Calling login, which may print 'Please Paste your Auth Code Below:'"
    )
    native_client.login(no_local_server=True, refresh_tokens=True)
    transfer_authorizer = native_client.get_authorizers().get("transfer.api.globus.org")
    globus_info.transfer_client = TransferClient(authorizer=transfer_authorizer)
```
I initially tried to change out this logic with that in #349, but it didn't seem to change terribly much.
It actually also complicated things, because it uses a different object type than the native_client instantiation that's now found in check_consents().
zstash/globus.py
Outdated
```python
set_clients(globus_info)
# Causes globus_sdk.services.auth.errors.AuthAPIError:
# ('POST', 'https://auth.globus.org/v2/oauth2/token', None, 400, 'Error', 'invalid_grant')
# check_consents(globus_info)
```
Trying to check consents as early as possible (i.e., as soon as we have a transfer_client) just results in an invalid_grant error, so that's no good.
@forsyth2 At some point, we should rope in some of those helpful globus service folk, and ask why we cannot obtain this information independent of issuing a transfer. This seems like a "useful" feature to me.
zstash/utils.py
Outdated
```python
globus_cfg: str = os.path.expanduser("~/.globus-native-apps.cfg")
logger.info(f"Checking if {globus_cfg} exists")
if os.path.exists(globus_cfg):
    logger.info(f"Removing {globus_cfg}")
    # Otherwise, may cause "Token is not active" TransferAPIError
    os.remove(globus_cfg)
```
I was able to auto-remove that cfg that was causing problems, so that at least removes step 2 ("Delete existing globus cfg file") from the process.
Hmm the It looks like
Even if I run the exact command the test is running (
@forsyth2 I've never looked at "test_globus.py", but (in principle) a test-module should be a driver that imports and exercises the functions of the module being tested, and not "re-code" the same functions (otherwise you are only testing the test-module). (That may be easier said than done, of course.)
@TonyB9000 Yeah, I'm not exactly sure why it was done this way; I would guess to make sure Globus is set up / torn down correctly. Overall, zstash testing is pretty annoying because basically all of the functionality is in bash, but the unit tests wrap all that in Python. That can cause problems, e.g. from my earlier comment:
I'm almost wondering if the best way to test
By that I mean, while
Perhaps these would be integration tests (#369) and true unit tests would remain in Python. The issue with true unit tests is that so much of
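The integration-test idea here could be sketched as a thin subprocess wrapper that drives a CLI the way the bash examples in this thread do, rather than re-coding its logic in Python (`run_cli` is a hypothetical helper, not part of zstash's test suite):

```python
import subprocess

def run_cli(cmd):
    """Run a command-line tool and return (exit_code, stdout), so tests
    exercise the real CLI instead of re-implementing its behavior."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode, proc.stdout

# For example, an integration test might call:
#   code, out = run_cli(["zstash", "version"])
```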
@forsyth2 Could you name the folder and script for me? I have the branch "refactor-zstash" but don't know where to look. I'll borrow from it when I get back this afternoon.
@TonyB9000 I simplified the script further, see 40a3d58 (now fewer than 200 lines). It's

Minimal Example Script

```python
import configparser
import os
import re
import shutil
from typing import Optional
from urllib.parse import ParseResult, urlparse

from fair_research_login.client import NativeClient
from globus_sdk import TransferAPIError, TransferClient, TransferData
from globus_sdk.response import GlobusHTTPResponse

# Minimal example of how Globus is used in zstash
# 1. Log into endpoints at globus.org
# 2. To start fresh, with no consents:
# https://app.globus.org/settings/consents > Manage Your Consents > Globus Endpoint Performance Monitoring > rescind all

HSI_DIR = "zstash_debugging_20250415_v2"

# Globus-specific settings ####################################################
GLOBUS_CFG: str = os.path.expanduser("~/.globus-native-apps.cfg")
INI_PATH: str = os.path.expanduser("~/.zstash.ini")
ZSTASH_CLIENT_ID: str = "6c1629cf-446c-49e7-af95-323c6412397f"
NAME_TO_ENDPOINT_MAP = {
    # "Globus Tutorial Collection 1": "6c54cade-bde5-45c1-bdea-f4bd71dba2cc",  # The unit test endpoint
    "NERSC HPSS": "9cd89cfd-6d04-11e5-ba46-22000b92c6ec",
    "NERSC Perlmutter": "6bdc7956-fc0f-4ad2-989c-7aa5ee643a79",
}

# Functions ###################################################################
def main():
    base_dir = os.getcwd()
    print(f"Starting in {base_dir}")
    if os.path.exists(INI_PATH):
        os.remove(INI_PATH)
    if os.path.exists(GLOBUS_CFG):
        os.remove(GLOBUS_CFG)
    try:
        simple_transfer("toy_run")
    except RuntimeError:
        print("Now that we have the authentications, let's re-run.")
    # /global/homes/f/forsyth/.globus-native-apps.cfg does not exist. zstash will need to prompt for authentications twice, and then you will need to re-run.
    #
    # Might ask for 1st authentication prompt:
    # Please paste the following URL in a browser:
    # Authenticated for the 1st time!
    #
    # Might ask for 2nd authentication prompt:
    # Please paste the following URL in a browser:
    # Authenticated for the 2nd time!
    # Consents added, please re-run the previous command to start transfer
    # Now that we have the authentications, let's re-run.
    os.chdir(base_dir)
    print(f"Now in {os.getcwd()}")
    assert os.path.exists(INI_PATH)
    assert os.path.exists(GLOBUS_CFG)
    simple_transfer("real_run")
    # /global/homes/f/forsyth/.globus-native-apps.cfg exists. If this file does not have the proper settings, it may cause a TransferAPIError (e.g., 'Token is not active', 'No credentials supplied')
    #
    # Might ask for 1st authentication prompt:
    # Authenticated for the 1st time!
    #
    # Bypassed 2nd authentication.
    #
    # Wait for task to complete, wait_timeout=300
    print(f"To see transferred files, run: hsi ls {HSI_DIR}")
    # To see transferred files, run: hsi ls zstash_debugging_20250415_v2
    # Shows file0.txt

def simple_transfer(run_dir: str):
    hpss_path = f"globus://{NAME_TO_ENDPOINT_MAP['NERSC HPSS']}/~/{HSI_DIR}"
    if os.path.exists(run_dir):
        shutil.rmtree(run_dir)
    os.mkdir(run_dir)
    os.chdir(run_dir)
    print(f"Now in {os.getcwd()}")
    dir_to_archive: str = "dir_to_archive"
    txt_file: str = "file0.txt"
    os.mkdir(dir_to_archive)
    with open(f"{dir_to_archive}/{txt_file}", "w") as f:
        f.write("file contents")
    url: ParseResult = urlparse(hpss_path)
    assert url.scheme == "globus"
    if os.path.exists(GLOBUS_CFG):
        print(
            f"{GLOBUS_CFG} exists. If this file does not have the proper settings, it may cause a TransferAPIError (e.g., 'Token is not active', 'No credentials supplied')"
        )
    else:
        print(
            f"{GLOBUS_CFG} does not exist. zstash will need to prompt for authentications twice, and then you will need to re-run."
        )
    config_path: str = os.path.abspath(dir_to_archive)
    assert os.path.isdir(config_path)
    remote_endpoint: str = url.netloc
    # Simulate globus_activate > set_local_endpoint
    ini = configparser.ConfigParser()
    local_endpoint: Optional[str] = None
    if ini.read(INI_PATH):
        if "local" in ini.sections():
            local_endpoint = ini["local"].get("globus_endpoint_uuid")
    else:
        ini["local"] = {"globus_endpoint_uuid": ""}
        with open(INI_PATH, "w") as f:
            ini.write(f)
    if not local_endpoint:
        nersc_hostname = os.environ.get("NERSC_HOST")
        assert nersc_hostname == "perlmutter"
        local_endpoint = NAME_TO_ENDPOINT_MAP["NERSC Perlmutter"]
    native_client = NativeClient(
        client_id=ZSTASH_CLIENT_ID,
        app_name="Zstash",
        default_scopes="openid urn:globus:auth:scope:transfer.api.globus.org:all",
    )
    # May print 'Please Paste your Auth Code Below:'
    # This is the 1st authentication prompt!
    print("Might ask for 1st authentication prompt:")
    native_client.login(no_local_server=True, refresh_tokens=True)
    print("Authenticated for the 1st time!")
    transfer_authorizer = native_client.get_authorizers().get("transfer.api.globus.org")
    transfer_client: TransferClient = TransferClient(authorizer=transfer_authorizer)
    for ep_id in [
        local_endpoint,
        remote_endpoint,
    ]:
        r = transfer_client.endpoint_autoactivate(ep_id, if_expires_in=600)
        assert r.get("code") != "AutoActivationFailed"
    os.chdir(config_path)
    print(f"Now in {os.getcwd()}")
    url_path: str = str(url.path)
    assert local_endpoint is not None
    src_path: str = os.path.join(os.getcwd(), txt_file)
    dst_path: str = os.path.join(url_path, txt_file)
    subdir = os.path.basename(os.path.normpath(url_path))
    subdir_label = re.sub("[^A-Za-z0-9_ -]", "", subdir)
    filename = txt_file.split(".")[0]
    label = subdir_label + " " + filename
    transfer_data: TransferData = TransferData(
        transfer_client,
        local_endpoint,  # src_ep
        remote_endpoint,  # dst_ep
        label=label,
        verify_checksum=True,
        preserve_timestamp=True,
        fail_on_quota_errors=True,
    )
    transfer_data.add_item(src_path, dst_path)
    transfer_data["label"] = label
    task: GlobusHTTPResponse
    try:
        task = transfer_client.submit_transfer(transfer_data)
        print("Bypassed 2nd authentication.")
    except TransferAPIError as err:
        if err.info.consent_required:
            scopes = "urn:globus:auth:scope:transfer.api.globus.org:all["
            for ep_id in [remote_endpoint, local_endpoint]:
                scopes += f" *https://auth.globus.org/scopes/{ep_id}/data_access"
            scopes += " ]"
            native_client = NativeClient(client_id=ZSTASH_CLIENT_ID, app_name="Zstash")
            # May print 'Please Paste your Auth Code Below:'
            # This is the 2nd authentication prompt!
            print("Might ask for 2nd authentication prompt:")
            native_client.login(requested_scopes=scopes)
            print("Authenticated for the 2nd time!")
            print(
                "Consents added, please re-run the previous command to start transfer"
            )
            raise RuntimeError("Re-run now that authentications are set up!")
        else:
            if err.info.authorization_parameters:
                print("Error is in authorization parameters")
            raise err
    task_id = task.get("task_id")
    wait_timeout = 300  # 300 sec = 5 min
    print(f"Wait for task to complete, wait_timeout={wait_timeout}")
    transfer_client.task_wait(task_id, timeout=wait_timeout, polling_interval=10)
    curr_task: GlobusHTTPResponse = transfer_client.get_task(task_id)
    task_status = curr_task["status"]
    assert task_status == "SUCCEEDED"

# Run #########################################################################
if __name__ == "__main__":
    main()
```

How to get latest code locally

```shell
git status
# Make sure there are no changes that could be wiped or cause merge conflicts when we switch branches
git remote -v
# Look for the one that is associated with "[email protected]:E3SM-Project/zstash.git"
# For me, that's "upstream"
git fetch upstream refactor-zstash
# Now do one of the following:
# git rebase upstream/refactor-zstash # Applies latest commits from the branch on GitHub.
# git reset --hard upstream/refactor-zstash # Resets local branch to exactly match the branch on GitHub.
```
@forsyth2 I would have thought this would work:
But then "examples" only shows "zstash_create_globus.py",
So I will try your method...
@TonyB9000 Not quite. You need an additional command.

```shell
git fetch origin refactor-zstash
# This updates your local git's "knowledge" of what's on the branch "refactor-zstash" on your remote "origin" (i.e., what's on GitHub)
git checkout refactor-zstash
# This tells your local git to switch you to your existing local branch "refactor-zstash"
```

Then either:

```shell
git reset --hard origin/refactor-zstash
# This is how to tell git "hey, actually make my local branch match what's on GitHub"
```

or:

```shell
git rebase origin/refactor-zstash
# This is how to tell git "hey, keep anything extra I added, but put any of my changes on top of the latest from GitHub"
```
Alternatively, create a new branch based on the GitHub branch with:

```shell
git fetch origin refactor-zstash
git checkout -b my-own-refactor-zstash origin/refactor-zstash
```
@forsyth2 Thanks Ryan. When I used "git fetch origin refactor-zstash" and then "git checkout refactor-zstash", I used no "-b", so I assumed it knew I was not creating my own new branch, but rather copying the existing branch from remote. However, I have since applied
Also
Does it matter that I began all of this with a fresh "git clone" of zstash?
@forsyth2 Worse:
@TonyB9000 Just use
@TonyB9000 Remarkably, this script works! (However, I should note it still uses the

Script

Thoughts from the AI

After changing to use bracketed scope syntax:
forsyth2
left a comment
@TonyB9000 (also @golaz, if you're interested) This commit (9edaf6f) gets rid of the second authentication, but with two important caveats:
- It's still using the fair_research_login client; I don't know if that's a huge deal.
- The far bigger deal is that it now requires knowing all the scopes up front. That didn't sound too bad on paper, but:
  - Globus just fails to produce an auth code at all if any scopes are unknown.
  - To get the auth code to paste, you have to be authenticated into all machines, even ones not involved in the transfer. For example, for a LCRC Improv DTN -> NERSC HPSS transfer, I had to be logged in not only to LCRC and NERSC, but also PNNL! That's obviously an annoyance and a complete blocker for anyone who doesn't have all 3 accounts.

That leaves us with two options: a) just accept the fact people are going to have to do a second auth paste on a toy run before doing a real run, or b) add some sort of parameter for users to pass in their local endpoint too (we know the remote from the hpss path) or else deduce it somehow, perhaps using Mache.
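For option b), the remote endpoint really is recoverable from the hpss path alone, since it is encoded as the netloc of a globus:// URL (this small helper is illustrative, not zstash's actual code):

```python
from urllib.parse import urlparse

def remote_endpoint_from_hpss(hpss_path: str) -> str:
    """Extract the remote Globus endpoint (UUID or alias) from a
    globus:// HPSS path such as globus://<endpoint>/~/some_dir."""
    url = urlparse(hpss_path)
    if url.scheme != "globus":
        raise ValueError(f"Not a globus path: {hpss_path}")
    return url.netloc
```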
My run:

```shell
cd ~/ez/zstash
lcrc_conda
rm -rf build
conda clean --all --y
conda env create -f conda/dev.yml -n zstash-370-20250723
conda activate zstash-370-20250723
pre-commit run --all-files
python -m pip install .
cd ../
mkdir zstash_test370_v2
rm ~/.globus-native-apps.cfg
# globus.org > File Manager > select "LCRC Improv DTN", "NERSC HPSS"
# https://auth.globus.org/v2/web/consents > Manage Your Consents > Globus Endpoint Performance Monitoring > rescind all
mkdir zstash_demo; echo 'file0 stuff' > zstash_demo/file0.txt
zstash create --hpss=globus://NERSC/~/manual_run_2025_07_23 zstash_demo
# Please paste the following URL in a browser:
# => UNKNOWN_SCOPE_ERROR
# client_id=6c1629cf-446c-49e7-af95-323c6412397f requested unknown scopes: ['https://auth.globus.org/scopes/08925f04-569f-11e7-bef8-22000b9a448b/data_access', 'https://auth.globus.org/scopes/de463ec4-6d04-11e5-ba46-22000b92c6ec/data_access']
#
# Comment out:
# r"theta.*\.alcf\.anl\.gov": "08925f04-569f-11e7-bef8-22000b9a448b",
# "ALCF": "de463ec4-6d04-11e5-ba46-22000b92c6ec",
rm -rf zstash_demo
rm ~/.globus-native-apps.cfg
# rm: cannot remove '/home/ac.forsyth2/.globus-native-apps.cfg': No such file or directory
#
# https://auth.globus.org/v2/web/consents > Manage Your Consents > Globus Endpoint Performance Monitoring > rescind all
# No consents
cd ../zstash
pre-commit run --all-files
python -m pip install .
cd ../zstash_test370_v2
mkdir zstash_demo; echo 'file0 stuff' > zstash_demo/file0.txt
zstash create --hpss=globus://NERSC/~/manual_run_2025_07_23 zstash_demo
# Only need to paste one authentication, but you have to log into LCRC, NERSC, and PNNL to get it!
```
That's actually not necessary. The latest commit (747feb2) solves this issue. We're able to authenticate just once and continue on; no "toy run" needed now! Also relevant: #370 (comment) on the topic of needing
The Globus epic lists these 3 points:
These steps are outlined in #339. I believe today's commits resolve this issue.
That's the bulk of this refactor PR: organizing information into objects, and refactoring functions to be easier to comprehend. Before merging, of course, I want to do some final code cleanup.
This is the only one I'm uncertain about. We do have
Actually, that was true on main too, so that can't be it.
@forsyth2 Ryan, you've done outstanding work here. I read through the thread, and I'm eager to test this. I have plenty of actual transfers I need to perform (from NERSC_HPSS to LCRC_IMPROV_DTN). Code-wise, I don't have anything I need to save, so I intend to do a fresh git-clone of zstash and conduct "zstash --check" as a means of pulling over an archive (and not continue with any "zstash extract" at that time).
ASIDE: I was pulling two archives over using the globus web, but 25% in, a NERSC admin suspended the job because it continued to fail on a file presenting permission issues (FILE=zstash_check_20240818.log.gz; how are these used?). Just this moment, Wuyin Lin got back to me and reset the permissions (global read), so I will see if I can get the admin to resume that transfer. (QUESTION: If I already have a long-running transfer ongoing via globus-web, would that compromise a "zstash --check" test?)
Thanks, great! I'm concerned we haven't solved the "tokens expire too soon" problem, so it would be good to check on that with a long transfer. And debug from there, if needed. Let me know if you have trouble using the code from this branch.
I don't think anything in
@forsyth2 I have a long-running transfer of "1pctCO2" and "abrupt-4xCO2" archives from NERSC to LCRC. But I have another ~27 archives to transfer (not all at once, but as needed). These would be different transfers, but involve the same endpoints (NERSC_HPSS to LCRC_IMPROV_DTN). So, if already authenticated to the endpoints, I was just thinking that it would not be a thorough test to use "zstash --check" to transfer another archive. It would be "a test", but we (more generally) want to test when credentials have "timed out" (not "expired", but new authentication is required). I suppose both tests would be useful, so I should set up a run. I'll git-clone the zstash repo, checkout the branch ("refactor-zstash"), and attempt a transfer.
Oh yes, that's true. We wouldn't be able to properly test the endpoint authentication if you have another transfer going on.
@forsyth2 NVMD - I left off the "--" on help. (Feels weird that "zstash --version" fails (must use "zstash version") but "zstash help" and "zstash command help" are not accepted...)
@TonyB9000, I'm responding to email with subject "Progress testing zstash-refactor" here because it is easier to write code in Markdown.
Yes. You've specified a cache and an HPSS archive. The assumption is you're running this command from a directory without an existing
Debugging# On branch refactor-zstash
git grep -n "Updated config using db. Now,"
# zstash/utils.py:129: f"Updated config using db. Now, maxsize={self.config.maxsize}, path={self.config.path}, hpss={self.config.hpss}, hpss_type={self.hpss_type}"
git grep -n "self.config.path = "
examples/zstash_create_globus.py:55: self.config.path = os.path.abspath(dir_to_archive)
# zstash/utils.py:72: self.config.path = abs_pathIn def set_dir_to_archive(self, path: str):
abs_path = os.path.abspath(path)
if abs_path is not None:
self.config.path = abs_path
self.dir_to_archive_relative = path
else:
raise ValueError(f"Invalid path={path}")git grep -n "\.set_dir_to_archive("
# zstash/create.py:176: command_info.set_dir_to_archive(args.path)
# zstash/extract.py:107: command_info.set_dir_to_archive(os.getcwd())
# zstash/ls.py:75: command_info.set_dir_to_archive(os.getcwd())
# zstash/update.py:110: command_info.set_dir_to_archive(os.getcwd())We're on def setup_extract(command_info: CommandInfo, arg_list: List[str]) -> argparse.Namespace:
# [...]
if args.cache:
command_info.cache_dir = args.cache
command_info.keep = args.keep
command_info.set_dir_to_archive(os.getcwd())
command_info.set_hpss_parameters(args.hpss, null_hpss_allowed=True)Let's look at what this was on the In # Class to hold configuration
class Config(object):
path: Optional[str] = None
hpss: Optional[str] = None
maxsize: Optional[int] = Nonegit grep -n "config\.path" zstash
# zstash/create.py:31: if config.path is not None:
# zstash/create.py:32: path: str = config.path
# zstash/create.py:34: raise TypeError("Invalid config.path={}".format(config.path))
# zstash/create.py:179: config.path = os.path.abspath(args.path)
# zstash/extract.py:207: logger.debug("Local path : {}".format(config.path))
# zstash/update.py:123: # config.path = os.path.abspath(args.path)
# zstash/update.py:189: logger.debug("Local path : {}".format(config.path))Again, we follow the def extract_database(
args: argparse.Namespace, cache: str, keep_files: bool
) -> List[FilesRow]:
# [...]
logger.debug("Running zstash " + cmd)
logger.debug("Local path : {}".format(config.path))
logger.debug("HPSS path : {}".format(config.hpss))
logger.debug("Max size : {}".format(config.maxsize))
logger.debug("Keep local tar files : {}".format(keep))It looks to me like You have: Was
Debugging# Back on branch refactor-zstash
git grep -n "Local path :" zstash
# zstash/extract.py:148: logger.debug(f"Local path : {command_info.config.path}")
# zstash/update.py:153: logger.debug(f"Local path : {command_info.config.path}")Following def extract_database(
command_info: CommandInfo, args: argparse.Namespace, do_extract_files: bool
) -> List[FilesRow]:
# [...]
logger.debug("Running zstash " + cmd)
logger.debug(f"Local path : {command_info.config.path}")
logger.debug(f"HPSS path : {command_info.config.hpss}")
logger.debug(f"Max size : {command_info.config.maxsize}")
logger.debug(f"Keep local tar files : {command_info.keep}")So, same point/question as above: Was
Debugginggit grep -n "Transferring file " zstash
# zstash/hpss.py:68: logger.info(f"Transferring file {transfer_word} HPSS: {file_path}")
# zstash/hpss.py:123: error_str: str = f"Transferring file {transfer_word} HPSS: {name}"It's an INFO line, so let's follow the first. def hpss_transfer(
command_info: CommandInfo,
file_path: str,
transfer_type: str,
non_blocking: bool = False,
):
# [...]
transfer_word: str
transfer_command: str
if transfer_type == "put":
transfer_word = "to"
transfer_command = "put"
elif transfer_type == "get":
transfer_word = "from"
transfer_command = "get"
else:
raise ValueError("Invalid transfer_type={}".format(transfer_type))
logger.info(f"Transferring file {transfer_word} HPSS: {file_path}")So, we must about to Let's go up a level. git grep -n "hpss_transfer("
# zstash/hpss.py:14:def hpss_transfer(
# zstash/hpss.py:157: hpss_transfer(command_info, file_path, "put", non_blocking)
# zstash/hpss.py:164: hpss_transfer(command_info, file_path, "get")Let's follow the def hpss_get(command_info: CommandInfo, file_path: str):
"""
Get a file from the HPSS archive.
"""
    hpss_transfer(command_info, file_path, "get")

git grep -n "hpss_get("
# zstash/extract.py:122: hpss_get(command_info, command_info.get_db_name())
# zstash/extract.py:518: hpss_get(command_info, tfname)
# zstash/hpss.py:160:def hpss_get(command_info: CommandInfo, file_path: str):
# zstash/ls.py:89: hpss_get(command_info, command_info.get_db_name())
# zstash/update.py:128: hpss_get(command_info, command_info.get_db_name())

So, for the tar retrieval, the call at extract.py:518 is the one to follow. In zstash/extract.py:

def open_tar_with_retries(
command_info: CommandInfo,
files_row: FilesRow,
args: argparse.Namespace,
cur: sqlite3.Cursor,
multiprocess_worker: Optional[parallel.ExtractWorker] = None,
) -> Tuple[str, tarfile.TarFile]:
# [...]
tfname: str = os.path.join(command_info.cache_dir, files_row.tar)
# [...]
if do_retrieve:
hpss_get(command_info, tfname)
if not check_sizes_match(cur, tfname):
            raise RuntimeError(f"{tfname} size does not match expected size.")

Let's compare to the main branch version:

def extractFiles( # noqa: C901
files: List[FilesRow],
keep_files: bool,
keep_tars: Optional[bool],
cache: str,
cur: sqlite3.Cursor,
args: argparse.Namespace,
multiprocess_worker: Optional[parallel.ExtractWorker] = None,
) -> List[FilesRow]:
# [...]
tfname = os.path.join(cache, files_row.tar)
# [...]
if do_retrieve:
hpss_get(hpss, tfname, cache)
if not check_sizes_match(cur, tfname):
raise RuntimeError(
f"{tfname} size does not match expected size."
)

Let's dive into hpss_get:

def hpss_get(hpss: str, file_path: str, cache: str):
"""
Get a file from the HPSS archive.
"""
    hpss_transfer(hpss, file_path, "get", cache, False)

And hpss_transfer:

def hpss_transfer(
hpss: str,
file_path: str,
transfer_type: str,
cache: str,
keep: bool = False,
non_blocking: bool = False,
is_index: bool = False,
):
# [...]
url = urlparse(hpss)
# [...]
url_path = url.path
# [...]
path, name = os.path.split(file_path)
# [...]
globus_status = globus_transfer(
endpoint, url_path, name, transfer_type, non_blocking
)

Ok, so on main we end up in the same globus_transfer call. Let's go back to the refactor:

def hpss_transfer(
command_info: CommandInfo,
file_path: str,
transfer_type: str,
non_blocking: bool = False,
):
# [...]
url = urlparse(command_info.config.hpss)
# [...]
url_path: str = str(url.path)
# [...]
path, name = os.path.split(file_path)
# [...]
globus_status = globus_transfer(
command_info.globus_info,
endpoint,
url_path,
name,
transfer_type,
non_blocking,
)
    # [...]

So, we can see it's basically doing the same thing here. It seems like the confusion is the naming of
Following
Blame for
So, it seems like a debugging line based on code debt at this point... The actual functionality seems ok based on the above analysis. That is, "is the messaging messed up?" seems to be the case.
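That reading can be spot-checked in isolation: both branches hand the hpss string to urlparse and split the file path with os.path.split before calling globus_transfer. Using the globus:// URL from the reproduction earlier in this thread (the tar/cache names below are illustrative):

```python
import os
from urllib.parse import urlparse

url = urlparse("globus://6c54cade-bde5-45c1-bdea-f4bd71dba2cc/~/manual_run")
print(url.netloc)  # 6c54cade-bde5-45c1-bdea-f4bd71dba2cc  (the endpoint)
print(url.path)    # /~/manual_run                         (url_path above)

# os.path.split yields the bare tar name passed on to globus_transfer.
path, name = os.path.split("test_cache/000000.tar")
print(path, name)  # test_cache 000000.tar
```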
Your first Globus screenshot has
That is, unless your home directory is
|
@forsyth2 In every case, my directory is
(NOTE: I switched from
The zstash command the script produced was:
|
|
@forsyth2 I want to reiterate: the "zstash check" command successfully pulled the NERSC "index.db" file over to the designated local cache directory, (PWD)/test_cache/, but thereafter failed to transfer the tar files from the same NERSC path. The image in my email was a snapshot I took to demonstrate that the files are available (although I cannot tell what read permissions they have). The first image was the Globus complaint about a path I don't quite understand. The sequence "/lcrc/group/e3sm2/ac.wlin/E3SMv3/v3.LR.hist-xGHG-xaer_0201" has no place in this operation, unless something in the tar archives has a symlink to such a path.
|
@forsyth2 FWIW, although "tar tvf" will reveal if a file is actually a symlink, the sqlite3 "files" table will not. That should not matter in this case, as "zstash check" was not given any specific files to consider.
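For reference, the link-ness lives only in the tar member metadata, and Python's tarfile module exposes it without extracting anything. A sketch with an in-memory archive (file and link names are illustrative):

```python
import io
import tarfile

# Build a tiny in-memory archive containing one symlink entry.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    entry = tarfile.TarInfo("file0_link.txt")
    entry.type = tarfile.SYMTYPE
    entry.linkname = "zstash_demo/file0.txt"
    tar.addfile(entry)

# Reading it back: issym() reveals what a files-table row cannot.
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tar:
    member = tar.getmember("file0_link.txt")
    print(member.issym(), member.linkname)  # True zstash_demo/file0.txt
```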
Extremely early, experimental draft of what a zstash refactor would look like. Specifically, the refactor would store as much state as possible in an object rather than passing around many variables (especially global variables).
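A minimal sketch of the pattern being described — names here are illustrative, not the branch's actual API: state that used to live in module globals moves onto one object that every command function receives, so a full parameter snapshot is available at any point.

```python
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class CommandInfo:
    cache_dir: str = "zstash"
    keep: bool = False
    hpss: Optional[str] = None

def run_create(info: CommandInfo) -> None:
    # Mutations stay on the object instead of rebinding module globals.
    info.hpss = "none"

info = CommandInfo()
run_create(info)
# A "snapshot" of the parameters at any given point:
print(asdict(info))  # {'cache_dir': 'zstash', 'keep': False, 'hpss': 'none'}
```

Besides removing `global` statements, this makes each command testable by constructing a fresh object per test instead of resetting module state.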